Abstract


Experimental systems in biology can now produce massive amounts of numerical data.
Large-scale analysis of such data is facilitated by costly integrated software packages.
Often, however, such packages have limited or no data reduction or manipulation tools, such
as basic spreadsheet functionality. The researcher is therefore compelled to prepare the data
in a competent spreadsheet package, such as Microsoft Excel, and then export the data set to
the analysis software. Reformatting or reducing the data must then be redone in the
spreadsheet after each analysis run, until the final dataset is appropriately formatted for the
analysis package. To exploit the capabilities of the spreadsheet software and to perform data
reduction or basic analysis without switching back and forth between software packages, we
are building a set of analysis tools in Excel. One of the most useful tools in the set is
ChromaBlast, which normalizes columnar data, sorts the data into user-selectable,
range-driven bins, develops a colour heat map from the data, and outputs the heat map and
bin assortment for review. Using intrinsic tools in the spreadsheet software, output data can
be filtered and sorted to emphasize data patterns and facilitate rapid data review.



Executive summary


Background: Recent efforts in genomics and microarray-based gene expression analysis have
led to massive data sets which require enhanced review and analysis tools. Many commercial
or open source data analysis packages do not have adequate spreadsheet capabilities.

For example, Statistica, a widely used comprehensive statistical package, does not have the
capabilities to re-sort data, cut and paste, or search and replace cells. Spreadsheet software,
which has excellent data manipulation features, usually has only basic charting and statistics,
and often requires hand entry of formulas and multiple steps for basic analysis. User-defined
macros are a useful shortcut system, but usually have insufficient flexibility for repeated work
on data sets that are similar but not identical. Typical analyses of large data sets involve
multiple iterations of data formatting, pruning, re-sorting, and reduction.

In order to begin to address these shortcomings, we are developing a set of data reduction
and analysis tools which exploit the capabilities of existing spreadsheet software, while
enabling data review and analysis. ChromaBlast is a component of this effort.

Results:  A  data  visualization  tool,  ChromaBlast,  was  developed  to  facilitate  rapid  and 
intuitive  data  display,  and  to  assist  in  reducing  data  complexity  for  interpretation.  This  tool 
has  been  used  to  analyze  genomic  microarray  data  as  well  as  other  data  types. 

Significance:  Well  documented  data  visualization  tools  which  are  easy  to  use  are  lacking 
within  the  general  area  of  bioinformatics.  Using  the  Visual  Basic  for  Applications 
environment  for  Microsoft  Excel,  such  a  tool  has  been  developed  with  application  in 
bioinformatics  and  other  areas  where  visualization  and  rapid  interpretation  of  complex  data 
sets  are  required. 

Future  Directions:  Enhancements  to  the  existing  software  will  include  default  color  maps 
which  can  be  user  modified,  and  simpler  extraction  of  alphabetic  data  representations. 


Ford, B.N., Shei, Y., Bjarnason, S., Richardson, C. 2006. ChromaBlast - A Data Visualization
Tool.  DRDC  Suffield  TM  2006-049.  Defence  R&D  Canada  -  Suffield. 

Introduction


Data collection from biological experimental work is no longer limited to a few replicates of a
single enzymatic assay. Current laboratory efforts can produce gene expression data from tens
of thousands of genes and dozens of replicates in a few days. The main work with biological
data has shifted from laboratory effort towards analysis and computational effort.
Unfortunately, software tools for this work have not, in general, kept pace with technological
progress. The tools that are available commercially are costly, minimally functional, and seem
to remain in a permanent state of beta development, with poorly behaving interfaces, overly
complex controls, and nonexistent customer support. Conversely, software developed for
generic functions, such as spreadsheets, tends to have poorly documented algorithms for
analysis, unvalidated statistical tools, and insufficient capacity for large datasets.

Examples of these problems can be found in locally deployed software. Microsoft Excel is a
full-function, high-quality spreadsheet system for generic data. In a first-pass gene expression
experiment, 23,000 genes were analyzed in multiple replicates, at multiple time points. The
data set comprises some 360 individual arrays of 23,000 intensity signals. The default
installation of Excel cannot contain the entire dataset in one spreadsheet. Development of an
enhancement to Excel using VBA (not documented here) was required in order to support the
entire dataset. J-Express, a "fully functioned" gene expression microarray package, is able to
load the entire dataset, but none of the analysis or display elements function when the entire
dataset is loaded.
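
The size problem can be checked with simple arithmetic. The sketch below is written in Python rather than the VBA of Annex A, and it assumes the worksheet limits of the Excel 97-2003 file format (65,536 rows by 256 columns); the report does not state which Excel version was used, so those limits are an assumption.

```python
# Dataset dimensions quoted in the text.
ARRAYS = 360
SIGNALS_PER_ARRAY = 23_000

# Assumed worksheet limits (Excel 97-2003 format); not stated in the report.
MAX_ROWS = 65_536
MAX_COLUMNS = 256


def fits_on_one_sheet(rows, columns):
    """Return True if a rows-by-columns table fits inside the assumed limits."""
    return rows <= MAX_ROWS and columns <= MAX_COLUMNS


TOTAL_VALUES = ARRAYS * SIGNALS_PER_ARRAY
print(TOTAL_VALUES)
# One array per column, one signal per row.
print(fits_on_one_sheet(SIGNALS_PER_ARRAY, ARRAYS))
# Transposed layout: one array per row, one signal per column.
print(fits_on_one_sheet(ARRAYS, SIGNALS_PER_ARRAY))
```

Under these assumed limits neither orientation fits, because both 360 and 23,000 exceed the 256-column limit.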

The usual solution to these issues is to employ multiple software packages, exploiting their
capabilities while working around their deficiencies. To begin to solve these issues for
ongoing work, development of data visualization and analysis tools using the programming
interface of MS Excel has been undertaken. ChromaBlast, a tool for normalization and
visualization of data, has been developed. ChromaBlast is a stand-alone component designed
for microarray work, but it can be used to look at various data types. This report describes the
functions, interface, and output of ChromaBlast.

Materials and Methods


Software  Code 

ChromaBlast was coded in Visual Basic for Applications (VBA) [1,2], which is a
programming language designed to connect the macro programming functions of Excel with
relatively sophisticated manual coding in a development environment. Preliminary macros
(keystrokes and program functions recorded while being used) can be used to quickly
template a specific piece of work. Using the macro record as a framework, generalization,
functional options, and user input can be implemented. Within VBA, a multitude of functions
can also be added which cannot be accessed via the macro recording process. Thus the
programmer can quickly template a concept with macros, then add function and usability
inside the VBA environment. VBA can also be used to develop code from scratch, like any
typical programming language.

Through a number of iterations involving the scientists who use the final product, a relatively
simple tool was developed, which nevertheless has considerable functionality, and incidentally
has properties during analysis which were not obvious in the design phase. ChromaBlast is
installed either alone or as a component of a larger suite called BioTools.XLA. The code for
ChromaBlast is at Annex A. The code presented uses the BioTools front menu, but the
ChromaBlast functionality is contained in the included script.

To use ChromaBlast, the user preselects (highlights) the range of cells for the subroutine, then
selects the number and colour of bins to be applied. With that information, the subroutine
then:

1. labels the selected range as a named range called Analysis.

2. adds the sheet "binsetup" from the BIOTOOLS add-in.

3. reassigns the RGB value of the 56 available colours within the current file. (Note: the
new colours become part of the information held within the file.)

4. counts the number of bins assigned by the user.

5. creates an array containing the alphabetical labels assigned by the user.

6. determines the number of columns and rows in the Analysis range.

7. moves to the first column within the range and selects the data in that column.

8. determines the minimum and maximum values within each column.

9. using the minimum and maximum values, creates the number of bins specified by the
user on the "binsetup" sheet.

10. assigns each data value to a bin.

11. assigns a letter value to each data value.

12. pastes the letter value for each data cell into a new column to the right of the current
data column.

13. loads the data letter values into an array.

14. pastes the array of letters as a block two columns to the right of the last data column.

15. colours each cell in the array.

16. moves the array of letters to the right, leaving behind the block of coloured cells.
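
The per-column binning at the heart of this procedure (steps 8 to 11) can be sketched as follows. This is a minimal illustration in Python, not the VBA of Annex A. The function names `bin_column` and `bin_range` are hypothetical, the labels are assumed to run A, B, C, ... in bin order (in ChromaBlast they are assigned by the user), and placing the column maximum in the last bin is an assumed boundary rule.

```python
import string


def bin_column(values, n_bins):
    """Assign each value in one column to an equal-width bin and return letter labels."""
    if not 1 <= n_bins <= 26:
        raise ValueError("n_bins must be between 1 and 26")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    letters = []
    for value in values:
        if width == 0:
            index = 0  # constant column: every value falls in the first bin
        else:
            # Clamp so that the column maximum lands in the last bin (assumed rule).
            index = min(int((value - lo) / width), n_bins - 1)
        letters.append(string.ascii_uppercase[index])
    return letters


def bin_range(columns, n_bins):
    """Bin every column independently, using that column's own minimum and maximum."""
    return [bin_column(column, n_bins) for column in columns]


# Two columns on very different scales, binned into four bins each.
print(bin_range([[1, 250, 500, 1000], [10, 20, 30, 50]], 4))
```

Because each column is binned against its own minimum and maximum, the letters of different columns are directly comparable even when the raw scales differ.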

Results


ChromaBlast was designed to facilitate rapid comparison of microarray data, without the prior
requirement for statistical analysis. It has also proved useful for analyzing disparate data
types. Numerical datasets ranging from microarray data to population statistics and global
temperature data have been analyzed here with ChromaBlast. Comparison of columns with
widely differing value ranges is straightforward, since the binning strategy intrinsically
normalizes between columns. Notably, data with letter values or large numbers of null values
or zeros are not appropriate for direct analysis with ChromaBlast. Conversion of nulls and
letters to representative numeric values (which could represent one extreme value) could be a
useful formatting strategy.
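
One possible form of that conversion is sketched below in Python (not the VBA of Annex A). The function name `to_numeric` and the choice of placeholder are hypothetical; the report does not prescribe a specific conversion.

```python
import math


def to_numeric(cells, placeholder):
    """Replace empty, non-numeric, or non-finite cells with a numeric placeholder."""
    cleaned = []
    for cell in cells:
        try:
            value = float(cell)
        except (TypeError, ValueError):
            # None, empty strings, and letter values cannot be converted.
            value = float(placeholder)
        if not math.isfinite(value):
            # Treat NaN and infinities as missing as well.
            value = float(placeholder)
        cleaned.append(value)
    return cleaned


print(to_numeric([3, None, "", "abc", "4.5"], 0))
```

Choosing the placeholder at one extreme of the column (for example its minimum) keeps the converted cells together in a single end bin, which is the strategy the text suggests.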

Figure 1 represents a data excerpt from a microarray genomic fingerprinting experiment,
which attempted to discriminate between various bacterial species and strains. The five main
columns represent hybridization patterns of two species, B. anthracis (leftmost) and
E. coli. The primary differentiation at this level is whether the samples exhibit different
hybridization patterns or very similar ones. No attempt is made to present statistical support
for the analysis at this point. 2000 data points from each of 5 arrays are represented. Even
though the columnar values range widely (e.g. column 1 ranges from 1 to 59,717, column 3
from 1 to 29,178), comparison of the normalized data differentiates Bacillus anthracis from
Escherichia species without difficulty. Column E, representing chip feature numbers
1601-2000, is a region where the Bacillus sp. exhibits a very different pattern from the
E. coli strains. Other areas, such as Column A, do not show very different patterns of
hybridization. It is clearly necessary to be able to see a large amount of the data set at once in
order to discern the main areas of difference. Simple examination of the numerical values
would not be a practical comparison strategy. ChromaBlast in this case allows an extreme
reduction in data complexity for simple inspection.

Figure 2 illustrates the application of ChromaBlast to a dataset of yearly crime data from the
United States Department of Justice (http://www.ojp.usdoj.gov/bjs/dtdata.htm). Represented
are data from 1960 to 2003, including total census population estimates. The leftmost
column holds the population data for the range of years. In the ChromaBlast-processed color
map (map position corresponds to spreadsheet cell position), the leftmost color column shows
that the population increases with time, so this column essentially reproduces the defined heat
map. The peak population occurs in the bins for the years 2001-2003. Conversely, it can be
seen that crime patterns do not match the population growth. Property crime actually reached
its peak incidence and rate in the years 1980-81, while murder incidence and rate peaked in
1991 and have been declining ever since. Indeed, all the crime rates shown have been in
decline since the early 1990s. These data could be displayed graphically, but representing all
of these sources on a single chart would be rather confusing. The ChromaBlast representation
is simple and compelling.

As a further example of the usefulness of this tool, we analysed global meteorological data for
temperature during the period 1880 to 2004 (Figure 3). This particular dataset is currently
highly politicized and controversial in the media, and is of broad public interest. In Figure 3,
the heat map color pattern is recapitulated in column A (population), increasing towards
the bottom (bright yellow).

Figure 1. ChromaBlast of first 2000 features of a genotyping microarray dataset.
Data are unfiltered.

Figure 2. Crime data for United States, 1960-2003. Source:
http://bjsdata.ojp.usdoj.gov/dataonline/Search/Crime/State/statebystatelist.cfm.
The leftmost column (population) recapitulates the heat map pattern.

Figure 3. Global temperature anomalies from 1881 to 2004, versus base period 1951-1980.
Source: http://data.giss.nasa.gov/gistemp/tabledata/GLB.Ts.txt.
Values are in 0.01 increments.

Discussion


Using microarray technology, one can assay thousands of features at once in each sample.

A microarray in this application is a microscope slide onto which are spotted several thousand
individual DNA probe sequences, each of which can detect unique fragments of labelled
DNA. Using DNA probes specific to known microbial sequences, one can identify with high
confidence the species and probably the strain of organism under examination. Such a tool is
a useful complement to existing PCR, RFLP, or AFLP technologies. Unfortunately, the data
from microarray experiments are difficult to work with, often containing thousands of values,
different scaling between experiments (i.e. between microarrays), relatively noisy variation
within and between experiments, and complex patterns of expression [3,4]. Rapid and facile
analysis tools are required with which one can perform simple data review and comparison
[5]. ChromaBlast, running under the Windows™ environment in Excel™, is part of the
answer to this problem.

ChromaBlast  assigns  a  color  and  letter  code  to  values  within  each  column,  dividing  the 
columnar  data  into  bins  of  equal  size,  distributed  evenly  over  the  range  of  column  values.  The 
maximum  and  minimum  values  are  used  to  establish  the  range  of  values  in  each  bin.  The 
effect  of  this  strategy  is  that  wide  ranges  of  values  in  adjacent  columns  will  end  up  with  color 
maps  which  are  automatically  scaled  for  the  total  range  of  the  data  in  the  column.  Thus 
datasets  which  are  scaled  differently  (e.g.  from  different  instruments  or  data  logging  units) 
can  be  compared  very  easily  without  employing  scaling  or  normalization  functions. 
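
The binning rule described above can be written compactly. The notation is introduced here for illustration only: $n$ is the number of bins, $x_{\min}$ and $x_{\max}$ are the column minimum and maximum, and assigning the maximum itself to the last bin is an assumed boundary convention.

```latex
w = \frac{x_{\max} - x_{\min}}{n},
\qquad
b(x) = \min\!\left(n,\ \left\lfloor \frac{x - x_{\min}}{w} \right\rfloor + 1\right)
```

As a worked example with a hypothetical $n = 10$: for column 1 of Figure 1 (range 1 to 59,717), $w = 5971.6$ and a value of 30,000 gives $\lfloor 29999 / 5971.6 \rfloor + 1 = 6$; for column 3 (range 1 to 29,178), $w = 2917.7$ and a value of 15,000 gives $\lfloor 14999 / 2917.7 \rfloor + 1 = 6$. Mid-range values in both columns land in the same bin even though the raw scales differ by a factor of about two.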

The letter code simply represents an alphabetic coding for the bins. The alphabetic code could
in principle be entirely arbitrary. It was developed for future exploitation in analysis using
existing algorithms for comparing and aligning alphabetic sequences, such as
Needleman-Wunsch [6], the basis for fast comparisons in genomic databases.
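
To illustrate how such letter codes might be compared, the sketch below computes a Needleman-Wunsch global alignment score for two letter sequences. It is written in Python, it is not part of ChromaBlast, and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen only for the example.

```python
def needleman_wunsch_score(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of two letter sequences."""
    # Row 0 of the dynamic programming table: seq_b aligned against gaps only.
    previous = [j * gap for j in range(len(seq_b) + 1)]
    for i, a in enumerate(seq_a, start=1):
        # Column 0: the first i letters of seq_a aligned against gaps only.
        current = [i * gap]
        for j, b in enumerate(seq_b, start=1):
            diagonal = previous[j - 1] + (match if a == b else mismatch)
            up = previous[j] + gap        # gap inserted in seq_b
            left = current[j - 1] + gap   # gap inserted in seq_a
            current.append(max(diagonal, up, left))
        previous = current
    return previous[-1]


# Letter codes of two binned columns (hypothetical values).
print(needleman_wunsch_score("AABD", "AACD"))
```

Only two rows of the table are kept at a time, which is enough when the score alone is needed; recovering the alignment itself would require the full table and a traceback.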

ChromaBlast is useful for genomic or gene expression microarray data review. The binning
strategy intrinsically normalizes the data, enabling comparisons between quite different
microarray sets. An unanticipated benefit of ChromaBlast is the effect of using bins which
are asymmetrically distributed. This is achieved by simply assigning the same color to
adjacent bins (colors are not mutually exclusive). This proves to be useful when datasets
contain an abundance of near-background values, or where values across a certain proportion
of the data are overdispersed, as is often observed in microarray datasets. Because
ChromaBlast assigns bin value ranges based on the upper and lower values in the column,
datasets with frequent zero or small values will tend to have a large frequency of low-heat
color assignments. If the dataset is deficient in middle range values, assigning more than one
bin the same color in the middle range can reduce the data complexity substantially.
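
The effect of sharing a color between adjacent bins can be sketched as a simple lookup. The example is in Python rather than the VBA of Annex A, and the color names and the five-bin layout are hypothetical.

```python
def colour_cells(letters, colour_by_letter):
    """Map each bin letter to its display colour."""
    return [colour_by_letter[letter] for letter in letters]


# Five equal-width bins, with the three middle bins sharing one colour.
colour_by_letter = {
    "A": "blue",
    "B": "green",
    "C": "green",
    "D": "green",
    "E": "red",
}

letters = ["A", "B", "C", "D", "E", "C"]
print(colour_cells(letters, colour_by_letter))
# Five bins collapse to three visual classes.
print(len(set(colour_by_letter.values())))
```

The letter codes still distinguish all five bins, so the merge affects only the display and can be undone by changing the color assignments.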

This tool was explicitly designed for microarray analysis, but can be readily applied to any
data which have a similar pattern of sampling (e.g. iterative sampling over time in multiple
replicates). Digitized data from graphical displays (such as spectrometers) could be compared
in this way. Digital data collected over long time periods can be easily summarized on a
single graphical display. The pattern of crime statistics shown in Figure 2 is an example of
this.

ChromaBlast also has subtle display properties which may reveal issues within datasets. The
global temperature anomaly data in Figure 3 are an interesting example (source:
http://data.giss.nasa.gov/gistemp/tabledata/GLB.Ts.txt). A 20-year periodicity of positive
temperature anomalies is apparent, with rising global maxima around 1900, 1920, 1940 and
1960, which subside within 2-4 years. Another global maximum might be predicted around
1980.

Notably, the values recorded around 1980 are indeed 20-year maxima based on the prior
years, but do not appear to decline again within a short period. Indeed, the temperature
anomalies after 1980 appear to continue to increase. Also discernible is that certain months of
the year seem to anticipate future trends. The temperature anomalies recorded in January
reflect the annual global temperature anomalies, but also exhibit a long-range rising trend,
which is recapitulated in annual averages with a multi-year lag.

It  is  apparent  that  ChromaBlast  has  value  in  complexity  reduction  and  intuitive  review  of  a 
diversity  of  data  types.  Future  improvements  will  include  a  wider  range  of  bins  (currently 
ChromaBlast  is  limited  to  26  bins),  automatic  bin  assignment  (with  user  modification 
optional)  using  generally  optimized  color  maps,  and  streamlined  output  of  the  alphabetic 
values  for  other  analysis  methods. 

Annex A


ChromaBlast  Code 


Attribute VB_Name = "ChromaBlast"
Option Base 1

Dim strMessage As String, strButtons As String, strTitle As String
Dim intResponse As Integer
Const APPNAME = "ChromaBlast I "

Sub StartChromaBlast()
    Dim intColourIndex As Integer
    Dim strMsg As String
    Dim intAns As Integer   'MsgBox returns a numeric result code

    strMsg = "Have you selected the data you wish to ChromaBlast?"
    intAns = MsgBox(strMsg, vbQuestion + vbYesNo, APPNAME)
    If intAns = vbNo Then Exit Sub

    'errors stay suppressed for the remainder of this procedure
    On Error Resume Next
    'delete the range if it already exists
    ActiveWorkbook.Names.Item("Analysis").Delete

    'create the range
    ActiveWorkbook.Names.Add Name:="Analysis", _
        RefersTo:=Selection

    'Add binsetup sheet from BIOTOOLS Addin
    Application.DisplayAlerts = False
    ChromaBlastV3.Sheet1.Copy _
        After:=ActiveWorkbook.Sheets(ActiveWorkbook.Sheets.Count)
    Sheets("binsetup").Select
    '  ActiveWindow.SelectedSheets.Delete
    Application.DisplayAlerts = True

    'set Workbook Colours with pretty display
    'Each loop writes the darkest shade first; every following If overwrites it
    'with a lighter shade for the lower palette indices.

    'White to Black
    intColourIndex = 1
    For intColourIndex = 1 To 56
        ActiveWorkbook.Colors(intColourIndex) = RGB(0, 0, 0)
        If intColourIndex < 56 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(77, 77, 77)
        End If
        If intColourIndex < 55 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(119, 119, 119)
        End If
        If intColourIndex < 54 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(150, 150, 150)
        End If
        If intColourIndex < 53 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(192, 192, 192)
        End If
        If intColourIndex < 52 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(221, 221, 221)
        End If
        If intColourIndex < 51 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(255, 255, 255)
        End If
    Next intColourIndex
    ' Violet
    intColourIndex = 43
    For intColourIndex = 43 To 49
        ActiveWorkbook.Colors(intColourIndex) = RGB(93, 0, 126)
        If intColourIndex < 49 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(115, 0, 156)
        End If
        If intColourIndex < 48 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(164, 0, 202)
        End If
        If intColourIndex < 47 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(204, 0, 255)
        End If
        If intColourIndex < 46 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(210, 121, 255)
        End If
        If intColourIndex < 45 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(215, 175, 255)
        End If
        If intColourIndex < 44 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(232, 209, 255)
        End If
    Next intColourIndex
    ' Indigo
    intColourIndex = 35
    For intColourIndex = 35 To 42
        ActiveWorkbook.Colors(intColourIndex) = RGB(0, 46, 138)
        If intColourIndex < 42 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(0, 0, 182)
        End If
        If intColourIndex < 41 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(0, 0, 224)
        End If
        If intColourIndex < 40 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(73, 73, 255)
        End If
        If intColourIndex < 39 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(129, 129, 255)
        End If
        If intColourIndex < 38 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(171, 171, 255)
        End If
        If intColourIndex < 37 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(217, 217, 255)
        End If
    Next intColourIndex
    ' Blue
    intColourIndex = 28
    For intColourIndex = 28 To 35
        ActiveWorkbook.Colors(intColourIndex) = RGB(0, 102, 102)
        If intColourIndex < 35 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(0, 128, 128)
        End If
        If intColourIndex < 34 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(0, 153, 153)
        End If
        If intColourIndex < 33 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(0, 214, 209)
        End If
        If intColourIndex < 32 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(3, 255, 255)
        End If
        If intColourIndex < 31 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(129, 255, 255)
        End If
        If intColourIndex < 30 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(197, 255, 255)
        End If
    Next intColourIndex
    ' Green
    intColourIndex = 21
    For intColourIndex = 21 To 28
        ActiveWorkbook.Colors(intColourIndex) = RGB(0, 110, 0)
        If intColourIndex < 28 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(0, 153, 0)
        End If
        If intColourIndex < 27 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(0, 220, 0)
        End If
        If intColourIndex < 26 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(0, 254, 0)
        End If
        If intColourIndex < 25 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(141, 255, 141)
        End If
        If intColourIndex < 24 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(187, 255, 187)
        End If
        If intColourIndex < 23 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(225, 225, 225)
        End If
    Next intColourIndex
    ' Yellow
    intColourIndex = 14
    For intColourIndex = 14 To 21
        ActiveWorkbook.Colors(intColourIndex) = RGB(110, 107, 0)

        If intColourIndex < 21 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(182, 178, 0)
        End If

        If intColourIndex < 20 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(214, 209, 0)
        End If

        If intColourIndex < 19 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(238, 232, 0)
        End If

        If intColourIndex < 18 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(255, 255, 7)
        End If

        If intColourIndex < 17 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(255, 255, 167)
        End If

        If intColourIndex < 16 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(255, 255, 205)
        End If

    Next intColourIndex

    ' Orange
    intColourIndex = 7
    For intColourIndex = 7 To 14
        ActiveWorkbook.Colors(intColourIndex) = RGB(210, 85, 0)

        If intColourIndex < 14 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(255, 102, 0)
        End If

        If intColourIndex < 13 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(255, 137, 19)
        End If

        If intColourIndex < 12 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(255, 162, 69)
        End If

        If intColourIndex < 11 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(255, 183, 111)
        End If

        If intColourIndex < 10 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(255, 206, 157)
        End If

        If intColourIndex < 9 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(255, 238, 221)
        End If

    Next intColourIndex

    ' Red
    intColourIndex = 1
    For intColourIndex = 1 To 7
        ActiveWorkbook.Colors(intColourIndex) = RGB(118, 0, 0)

        If intColourIndex < 7 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(168, 0, 0)
        End If

        If intColourIndex < 6 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(224, 0, 0)
        End If

        If intColourIndex < 5 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(255, 1, 1)
        End If

        If intColourIndex < 4 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(255, 107, 121)
        End If

        If intColourIndex < 3 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(255, 159, 161)
        End If

        If intColourIndex < 2 Then
            ActiveWorkbook.Colors(intColourIndex) = RGB(255, 213, 221)
        End If

    Next intColourIndex
End Sub
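
Each cascade of If blocks leaves a palette slot holding the last colour whose condition is true, so the lowest slot of a group ends up lightest and the highest darkest. The same result for the red group could be sketched with a lookup array. This is only an illustration: the name SetRedRamp is hypothetical and not part of the original module, and the shade values assume the red RGB triplets are read from the listing as shown in the comments.

```vba
' Hypothetical helper, not part of the original listing.
Sub SetRedRamp()
    Dim varShades As Variant
    Dim intSlot As Integer

    ' Final colours of palette slots 1 to 7 after the red loop has run,
    ' lightest first (assumed reading of the RGB values in the listing).
    varShades = Array( _
        RGB(255, 213, 221), RGB(255, 159, 161), RGB(255, 107, 121), _
        RGB(255, 1, 1), RGB(224, 0, 0), RGB(168, 0, 0), RGB(118, 0, 0))

    For intSlot = 1 To 7
        ' LBound keeps the lookup correct whatever Option Base is in force
        ActiveWorkbook.Colors(intSlot) = varShades(LBound(varShades) + intSlot - 1)
    Next intSlot
End Sub
```

A lookup array makes the slot-to-colour mapping visible at a glance, at the cost of departing from the loop structure used throughout the original macro.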


Sub PreviewAssignedColours()

    Dim intCellCount As Integer, intCounter As Integer
    Dim ArrayColourBins As Variant

    'count cells in Bin range to determine the number of bins
    Range("bins").Select
    intCellCount = 0
    For Each xCell In Selection
        If xCell.Value > 0 Then intCellCount = intCellCount + 1
    Next xCell

    'create array of Labels
    On Error Resume Next 'in case no bins were selected

    ReDim ArrayColourBins(intCellCount, 2)
    intCounter = 1
    For Each xCell In Selection
        If xCell.Value > 0 Then 'if the cell has a value
            ArrayColourBins(intCounter, 1) = xCell.Offset(-1, 0).Value 'put the label into the array
            ArrayColourBins(intCounter, 2) = xCell.Value 'put the colour value into the array
            xCell.Interior.ColorIndex = ArrayColourBins(intCounter, 2) 'colour the cell
            intCounter = intCounter + 1 'increment the counter
        Else
            xCell.ClearFormats
        End If
    Next xCell

    'dump the array values into cells to prove they were picked up
    ' Range("a20").Select
    ' Range(ActiveCell.Offset(0, 0), ActiveCell.Offset(intCellCount - 1, 1)).Value = ArrayColourBins

End Sub

Sub CCBins()

    Dim intRowCount As Integer, intColCount As Integer, intBinCount As Integer
    Dim BinArray 'hold bin values
    Dim SequenceArray As Variant 'hold sequence values
    Dim intCounter2 As Integer 'counter to move from column to column
    Dim intCounter As Integer
    Dim ArrayColourBins As Variant
    Dim varValue As Variant
    Dim varCheckBin As Variant 'compares array values


    'count cells in Bin range to determine the number of bins
    Range("bins").Select
    intBinCount = 0
    For Each xCell In Selection
        If xCell.Value > 0 Then intBinCount = intBinCount + 1
    Next xCell

    'create array of Labels
    On Error Resume Next 'in case no bins were selected

    ReDim ArrayColourBins(intBinCount, 2)
    intCounter = 1
    For Each xCell In Selection
        If xCell.Value > 0 Then 'if the cell has a value
            ArrayColourBins(intCounter, 1) = xCell.Offset(-1, 0).Value 'put the label into the array
            ArrayColourBins(intCounter, 2) = xCell.Value 'put the colour value into the array
            xCell.Interior.ColorIndex = ArrayColourBins(intCounter, 2) 'colour the cell
            intCounter = intCounter + 1 'increment the counter
        Else
            xCell.ClearFormats
        End If
    Next xCell


    'dump the array values into cells to prove they were picked up
    'test to this point to make sure it works
    ' Range("a4").Select
    ' Range(ActiveCell.Offset(0, 0), ActiveCell.Offset(intBinCount - 1, 1)).Value = ArrayColourBins

    On Error Resume Next
    ActiveWorkbook.Names.Item("myRange").Delete
    ActiveWorkbook.Names.Item("CurrentRange").Delete

    'go to Analysis range
    Application.Goto Reference:="Analysis"

    'count range dimensions
    intRowCount = Selection.Rows.Count - 1 '-1 because of offset
    intColCount = Selection.Columns.Count
    'set counter2 for use with column looping
    intCounter2 = 1

    'Redim create sequencearray to size of selection
    ReDim SequenceArray(intRowCount + 1, intColCount + 1)

    'go to first column in selection, insert column
    Range(ActiveCell.Offset(0, 1), ActiveCell.Offset(intRowCount, 1)).Select
    ActiveCell.EntireColumn.Insert
    Selection.Columns.ColumnWidth = 4
    'loop through columns
    For intCounter2 = 1 To intColCount
        'select the range
        Range(ActiveCell.Offset(0, -1), ActiveCell.Offset(intRowCount, -1)).Select
        'create the range
        ActiveWorkbook.Names.Add Name:="myRange", RefersTo:=Selection

        'find the min and max value of myRange
        myMin = WorksheetFunction.Min(Range("myRange"))
        myMax = WorksheetFunction.Max(Range("myRange"))

        'create the bin values for myRange
        ReDim BinArray(intBinCount, 1)
        intCounter = 1
        For intCounter = 1 To intBinCount
            BinArray(intCounter, 1) = (((myMax - myMin) / intBinCount) * intCounter) + myMin
        Next intCounter

        'select the column beside active selection and call it CurrentRange
        Range(ActiveCell.Offset(0, 1), ActiveCell.Offset(intRowCount, 1)).Select
        ActiveWorkbook.Names.Add Name:="CurrentRange", RefersTo:=Selection
        Set rng = Range("CurrentRange")

        'fill each cell with bin formula
        For Each xCell In Selection
            varValue = xCell.Offset(0, -1).Value 'check the value of the cell to the left
            intCounter = 1 'set the counter
            varCheckBin = BinArray(intCounter, 1) 'set the initial value before the loop begins
            Do
                'sets the cell value to first label, then loops through checking the varValue against
                'varCheckBin array until the value is less than or equal.
                xCell.Value = ArrayColourBins(intCounter, 1)
                If intCounter > intBinCount Then Exit Do
                varCheckBin = BinArray(intCounter, 1)
                intCounter = intCounter + 1
            Loop Until varValue <= varCheckBin
        Next xCell


        'place values in CurrentRange into the SequenceArray looping through each cell
        For intCounter3 = 1 To intRowCount + 1
            SequenceArray(intCounter3, intCounter2) = rng.Cells(intCounter3, 1).Value
        Next intCounter3

        'repeat for next range
        Range(ActiveCell.Offset(0, 2), ActiveCell.Offset(intRowCount, 2)).Select
        ActiveCell.EntireColumn.Insert
        Selection.Columns.ColumnWidth = 4

    Next intCounter2

    'move over and fill with Array info
    Range(ActiveCell.Offset(0, 1)).Select
    Range(ActiveCell.Offset(0, 0), ActiveCell.Offset(intRowCount, intColCount)).Value = SequenceArray
    Range(ActiveCell.Offset(0, 0), ActiveCell.Offset(intRowCount, intColCount - 1)).Select
    'set column widths
    ActiveWorkbook.Names.Add Name:="CurrentRange", RefersTo:=Selection
    Range("CurrentRange").Columns.ColumnWidth = 4
    'colour the cells
    For Each xCell In Selection
        varValue = xCell.Value
        intCounter = 1 'set the counter
        varCheckBin = ArrayColourBins(intCounter, 1) 'set the initial value before the loop begins
        xCell.Interior.ColorIndex = ArrayColourBins(intCounter, 2)
        Do
            'sets the cell value to first label, then loops through checking the varValue against
            'varCheckBin array until the value is equal.
            If varValue = varCheckBin Then Exit Do
            varCheckBin = ArrayColourBins(intCounter, 1)
            xCell.Interior.ColorIndex = ArrayColourBins(intCounter, 2)
            intCounter = intCounter + 1
        Loop Until varValue = varCheckBin
    Next xCell

    'clear the contents of the cells
    Selection.ClearContents

    'place SequenceArray beside Coloured Bins (as per Y's request)
    'move over and fill with Array info
    ' Range(ActiveCell.Offset(0, (2 * intColCount) + 1)), ActiveCell.Offset(intRowCount, -1)).Select
    Range(ActiveCell.Offset(0, intColCount + 1), ActiveCell.Offset(intRowCount, (2 * intColCount))).Select
    Range(ActiveCell.Offset(0, 0), ActiveCell.Offset(intRowCount, intColCount)).Value = SequenceArray
    Range(ActiveCell.Offset(0, 0), ActiveCell.Offset(intRowCount, intColCount - 1)).Select
    'set column widths
    ActiveWorkbook.Names.Add Name:="CurrentRange", RefersTo:=Selection
    Range("CurrentRange").Columns.ColumnWidth = 4
End  Sub 
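
The bin edges built in CCBins are equally spaced between the column minimum and maximum, and each value takes the label of the first bin whose upper edge it does not exceed. A minimal sketch of that rule as a standalone function follows. BinIndex and its parameter names are hypothetical and not part of the original module, and clamping the result to the last bin is an assumption made here to guard against rounding at the maximum.

```vba
' Hypothetical helper, not part of the original listing.
Function BinIndex(ByVal dblValue As Double, ByVal dblMin As Double, _
                  ByVal dblMax As Double, ByVal intBinCount As Integer) As Integer
    Dim intBin As Integer
    Dim dblUpperEdge As Double

    For intBin = 1 To intBinCount
        ' Upper edge of bin intBin, same formula as the BinArray calculation
        dblUpperEdge = (((dblMax - dblMin) / intBinCount) * intBin) + dblMin
        If dblValue <= dblUpperEdge Then
            BinIndex = intBin
            Exit Function
        End If
    Next intBin

    ' Assumption: a value just above the last edge is placed in the last bin
    BinIndex = intBinCount
End Function
```

For example, BinIndex(7.5, 0, 10, 4) returns 3, since the four upper edges are 2.5, 5, 7.5 and 10. Keeping the rule in a function that touches no cells makes it possible to check the binning separately from the worksheet selection and colouring steps.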

Sub CleanWorkbook()

    'clean up workbook
    On Error Resume Next
    Application.DisplayAlerts = False
    ActiveWorkbook.Names.Item("myRange").Delete
    ActiveWorkbook.Names.Item("CurrentRange").Delete
    ActiveWorkbook.Names.Item("Analysis").Delete
    Sheets("binsetup").Delete
    Application.DisplayAlerts = True

End Sub

List of symbols/abbreviations/acronyms/initialisms


AFLP    amplified fragment length polymorphism
DNA     deoxyribonucleic acid
PCR     polymerase chain reaction
RFLP    restriction fragment length polymorphism
RNA     ribonucleic acid

Glossary


fingerprint 

a collection of signal intensity scores, digitized from an image of a
hybridization of genomic DNA to a microarray spotted with DNA
fragments. The fingerprint of a given species and strain is distinct
from that of other species or strains.

gene 

a  DNA  sequence  which  encodes  a  single  genetic  trait  which  is 
inherited  by  offspring 

genomic  DNA 

the DNA which comprises the genetic material of a cell, and is
inherited by the progeny of the cell. The so-called blueprint of life.
The sequence of nucleotides in the genomic DNA comprises the
genes, and determines the properties of the microbe. For many
microbes, the sequence of the genomic DNA is in the public domain.

hybridization 

sample DNA (or RNA) is tagged with a fluorescent dye, then applied
to the surface of the microarray. Under controlled conditions,
sequences in the sample DNA which correspond to sequences in the
microarray features will bind to the features (hybridize).
Hybridization often refers to the entire process, from labeling
through binding to post-incubation washing.

microarray 

a  microscope  slide,  filter  membrane  or  other  solid  surface,  onto 
which  DNA  fragments  have  been  spotted  in  an  organized  grid.  Each 
spot  is  called  a  feature. 

nucleotide 

the components of DNA are the nucleotides deoxyadenosine
monophosphate, deoxycytidine monophosphate, deoxyguanosine
monophosphate, deoxythymidine monophosphate, and the chemical
bonds which join them into long chains. Genetic information is
encoded in the order in which the nucleotides occur in the DNA
chain.

oligonucleotide  (oligo) 

a fragment of DNA (or RNA) chemically synthesized, and often
representing some section of genetic material whose sequence is
already known. Oligos may also be "random" in sequence, such that
the oligo sequence is not intentionally derived from known DNA
sequences.

species 

the  grouping  of  microbes  according  to  major  genetic  differences  (e.g. 
the  ability  to  grow  (or  not)  in  an  oxygen-free  environment) 

strain 

a  microbe  which  differs  from  other  members  of  the  same  species  by 
minor  or  additional  genetic  characters  (e.g.  resistance  or  sensitivity 
to penicillin).